Making Software

Tags: #technology #software engineering #research #evidence-based #development

Authors: Andy Oram, Greg Wilson

Overview

This book is about the quest for convincing evidence in software engineering. What makes evidence convincing, how can we collect and analyze it, and how can we use it to make better decisions about our software development practices? We begin by exploring the nature of evidence itself. What makes a piece of evidence strong, reliable, and relevant? We then delve into a variety of specific topics in software engineering, including the effectiveness of test-driven development, the importance of good bug reports, the impact of organizational structure on software quality, the challenges of learning to program, the differences between open source and proprietary software, and the role of design patterns in software development. For each topic, we present the available evidence, explain how it was collected and analyzed, and discuss the implications for software developers. Our goal is to help you become a more critical and informed consumer of software engineering research, so that you can make better decisions about your work. This book is written for practicing software developers, but it will also be of interest to researchers, students, and anyone else who wants to learn more about how to build better software. All author royalties from this book will be donated to Amnesty International.

Book Outline

1. The Quest for Convincing Evidence

This chapter sets the scene for the book by exploring what constitutes “convincing evidence” in software engineering. It argues that merely elegant, statistically strong, and replicable evidence, though valuable in itself, does not always have the desired impact on improving real-world practices. Therefore, the chapter proposes a more practical definition of convincing evidence as one that “motivates change.”

Key concept: Results are far more convincing when they’re found again and again in many different contexts—i.e., not limited to one context or set of experimental conditions. In other sciences, replication builds confidence, and for this reason much effort has been expended to make software engineering experiments easy to rerun by other researchers in other contexts [Basili et al. 1999].

2. Credibility, or Why Should I Insist on Being Convinced?

This chapter delves into the notion of “credibility” in software engineering evidence and explores how to think critically about the information that emerges over time from various sources, such as experience, colleagues, reflection, reading, and research. It highlights the importance of distinguishing between “credible” evidence that is well-founded and “relevant” evidence that is pertinent to the specific questions we seek to answer.

Key concept: Software development is similar: evidence emerges over time, and the quality of the engineering hinges on the critical faculties of the engineer. If we succumb to confirmatory bias, we pay most attention to evidence that confirms our views.

3. What We Can Learn from Systematic Reviews

This chapter advocates for the use of systematic reviews (SRs) as a rigorous methodology for aggregating evidence from different empirical studies to support informed decision making in software engineering. It emphasizes that relying solely on “common knowledge” and expert opinion can be problematic, as experts can be wrong, and informal reviews can overlook or misinterpret important studies.

Key concept: The major advantage of an SR is that it is based on a well-defined methodology.

4. Understanding Software Engineering Through Qualitative Methods

This chapter focuses on the importance of qualitative methods for understanding software engineering phenomena that are not easily quantifiable. It argues that while numbers are essential in many aspects of computing, they don’t answer all the “why” and “how” questions that arise in software development.

Key concept: Put simply, qualitative methods entail the systematic gathering and interpretation of nonnumerical data (including words, pictures, etc.).

5. Learning Through Application: The Maturing of the QIP in the SEL

This chapter recounts the experiences and lessons learned from the NASA Software Engineering Laboratory (SEL), highlighting the importance of an evolutionary, iterative approach to empirical research in software engineering. It introduces the Quality Improvement Paradigm (QIP) as a framework for applying the scientific method in an industrial context, emphasizing the need for continual learning and adaptation based on feedback from practice.

Key concept: The Quality Improvement Paradigm is a double-loop process, as shown by Figure 5-1. Research interacts with practice, represented by project learning and corporate learning based upon feedback from application of the ideas.

6. Personality, Intelligence, and Expertise: Impacts on Software Development

This chapter delves into the factors that impact software development, exploring the roles of personality, intelligence, and expertise in shaping individual and team performance. It examines the debate between focusing on fixed individual traits (personality, intelligence) versus malleable ones (skills, expertise), and considers the implications for hiring, team formation, and the use of tools and techniques.

Key concept: Characteristics that separate one individual from another—or individual differences, as researchers call them for short—can be classified along a continuum from fixed to malleable.

7. Why Is It So Hard to Learn to Program?

This chapter investigates the challenges inherent in learning to program. It examines research findings that reveal high failure rates in introductory programming courses and the difficulties students face in grasping fundamental programming concepts. The chapter explores whether these difficulties are inherent in the activity itself or stem from inadequacies in teaching methods or tools.

Key concept: Most of our studies point more toward how complex it is for humans to learn how to program a computer.

8. Beyond Lines of Code: Do We Need More Complexity Metrics?

This chapter explores the limitations of traditional code complexity metrics, such as cyclomatic complexity, by examining how strongly they correlate with lines of code and other size measures in a large corpus of open-source C code. The findings suggest that although these metrics are widely used, they may provide little information beyond what a simple size metric like lines of code already gives.

Key concept: In our opinion, there is a clear lesson from this study: syntactic complexity metrics cannot capture the whole picture of software complexity.
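
The kind of correlation check described above can be sketched in a few lines. This is only an illustration, not the chapter's method: the study analyzed C code, whereas this sketch measures Python snippets with the standard `ast` module, and the names `lines_of_code`, `approx_cyclomatic`, `pearson`, and the three `SAMPLES` are all hypothetical. The complexity count is a rough approximation (one plus the number of branching constructs).

```python
import ast
import math

# Node types treated as branch points in this rough approximation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp, ast.IfExp)


def lines_of_code(source: str) -> int:
    """Count lines that are neither blank nor pure comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def approx_cyclomatic(source: str) -> int:
    """Rough cyclomatic complexity: 1 + number of branching constructs."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical stand-ins for a real corpus of source files.
SAMPLES = [
    "x = 1\n",
    "def f(a):\n    if a:\n        return 1\n    return 0\n",
    "def g(items):\n    total = 0\n    for i in items:\n"
    "        if i > 0 and i % 2:\n            total += i\n    return total\n",
]

sizes = [lines_of_code(s) for s in SAMPLES]
complexities = [approx_cyclomatic(s) for s in SAMPLES]
r = pearson(sizes, complexities)
```

A value of `r` close to 1 on a real code base would suggest that the complexity metric adds little beyond size, which is the pattern the chapter reports. Three tiny snippets are of course far too few to conclude anything.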

9. An Automated Fault Prediction System

This chapter presents a model for automated fault prediction in large software systems, based on empirical analysis of past releases. By analyzing code properties, process properties, and fault counts from earlier releases, the model identifies files that are most likely to have defects in the next release, allowing for focused testing efforts and resource allocation.

Key concept: Accurate prediction of the parts of a system that are most likely to have faults can provide developers, testers, and managers with a large head start on finding problems, increase the efficiency of testing, and help make the most of resources that are usually in short supply.

10. Architecting: How Much and When?

This chapter delves into the question of “how much architecting is enough?” for software projects. It presents research findings on the cost-to-fix growth evidence, showing that the cost of fixing defects increases significantly as a project progresses. It highlights the value of architecting and risk resolution in reducing rework and proposes a framework for determining the optimal architecting investment for different project sizes and types.

Key concept: In this context, “architecting” does not refer to producing the equivalent of blueprints for the software, but to the overall set of concurrent frontend activities (site surveys, operations analysis, needs and opportunities analysis, economic analysis, requirements and architecture definition, planning and scheduling, verifying and validating feasibility evidence) that are key to creating and sustaining a successful building or software product, as elaborated in the book Systems Architecting [Rechtin 1991].

11. Conway’s Corollary

This chapter explores the validity of “Conway’s Law”, which states that the design of a system will mirror the communication structure of the organization that created it. It presents empirical evidence from studies of both industrial and open source projects to show how organizational structure impacts software architecture and developer productivity.

Key concept: Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.

12. How Effective Is Test-Driven Development?

This chapter examines the effectiveness of Test-Driven Development (TDD) through a systematic review of research studies. Using the analogy of a medical pill, it explores the impacts of TDD on internal code quality, external quality, productivity, and test quality, presenting a summary of clinical trials and discussing the potential benefits and drawbacks of this agile practice.

Key concept: In this chapter, we treat TDD as an imaginary medical pill and describe its effects with a narrative from a pharmacological point of view, instead of providing a formal systematic review report. We invite the reader to imagine that the rest of this chapter is a medical fact sheet for the TDD “pill” and to continue reading with the following question in mind: “If TDD were a pill, would you take it to improve your health?”

13. Why Aren’t More Women in Computer Science?

This chapter investigates the reasons behind the underrepresentation of women in computer science. It explores three major viewpoints: ability deficits in women, lack of interest in computer science among women, and cultural biases and stereotypes that direct women away from the field. The chapter reviews research findings and explores possible interventions to address this issue.

Key concept: Curiously, 23 years earlier (in 1985), 37% of computer science bachelor’s degrees were awarded to women [National Center for Education Statistics 2008]. Between 2001 and 2008 alone, there was a 79% decline in the number of incoming undergraduate women interested in majoring in computer science [Higher Education Research Institute 2008].

14. Two Comparisons of Programming Languages

This chapter presents the findings of two studies that compare different programming languages in terms of execution speed, memory consumption, programmer productivity, code structure, and reliability. The author emphasizes that the studies, though limited in scope, challenge common beliefs about language superiority and highlight the influence of language-specific cultures on programmer thinking and design approaches.

Key concept: The language used really shaped the programmers’ thinking here—even though many of them also knew a language of the other type.

15. Quality Wars: Open Source Versus Proprietary Software

This chapter investigates the long-standing debate of open-source versus proprietary software quality by examining code quality metrics from four large-scale operating systems: FreeBSD, Linux, OpenSolaris, and the Windows Research Kernel (WRK). The findings reveal that there are no significant across-the-board code quality differences between these systems, suggesting that the engineering requirements drive code quality more than the development process.

Key concept: At the very least, the results indicate that the structure and internal quality attributes of a large and complex working software artifact, will represent first and foremost the formidable engineering requirements of its construction, with the influence of process being marginal, if any.

16. Code Talkers

This chapter explores the critical role of communication in software development. Through observational studies and surveys of professional programmers, the author reveals that programmers spend a significant portion of their time communicating with colleagues, primarily seeking information or clarifications that are not readily available in documentation. The chapter highlights the importance of understanding the “rationale” behind code, which is often only found in the team’s collective memory.

Key concept: Often, examining the code and its behavior is not enough to reach a full understanding. In the story I cited in the previous section, the interviewed programmer turned to colleagues to learn the rationale behind the code—a kind of information that is not typically recorded, but exists only in the team’s collective memory.

17. Pair Programming

This chapter delves into the practice of pair programming, exploring its history, benefits, and challenges in both industrial and educational settings. It highlights the empirical evidence supporting the effectiveness of pair programming in improving code quality, reducing defects, and enhancing team communication, knowledge sharing, and learning.

Key concept: Pair programming is a style of programming in which two programmers work side-by-side at one computer, continuously collaborating on the same design, algorithm, code, or test.

18. Modern Code Review

This chapter discusses the best practices and techniques for performing efficient code reviews, emphasizing the value of careful code inspection in detecting defects and improving software quality. The author presents empirical evidence showing that code reviews can be more effective than testing in finding bugs, but only when done properly, considering factors such as focus fatigue, review speed, and the context of the code being reviewed.

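Key concept: Code reviews can find more bugs than testing, but only when done properly, with attention to reviewer fatigue, review speed, and the context of the code under review.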

19. A Communal Workshop or Doors That Close?

This chapter investigates the impact of office space layout on the productivity and collaboration of collocated software teams. Exploring the trade-off between isolation and communication, the author presents evidence supporting both private offices that minimize distractions and communal workshops that foster interaction. The chapter suggests that the best approach depends on the specific work patterns and needs of the team.

Key concept: Unfortunately, research still has not provided a clear guideline to making a choice in the particular case of software development teams.

20. Identifying and Managing Dependencies in Global Software Development

This chapter focuses on the critical role of managing dependencies in global software development (GSD). Drawing on empirical research and case studies, the author introduces the concept of socio-technical congruence (STC), which emphasizes aligning coordination needs arising from technical dependencies with the coordination capabilities provided by the project’s social and organizational structure.

Key concept: STC refers to the relationships between the coordination needs that emerge from the technical context of a development project and the coordination capabilities provided by the socio-organizational structure of the project.
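
One way to make this definition concrete is to compute the share of coordination needs that are matched by coordination that actually happens. The summary above gives no formula, so the sketch below is an assumption: it derives needs from task dependencies and task assignments, then takes the ratio of needs that are met. All names and data are hypothetical.

```python
def coordination_needs(assignments, dependencies):
    """Pairs of developers who work on tasks that depend on each other.

    assignments:  dict mapping task name -> set of developer names
    dependencies: iterable of (task_a, task_b) technical dependencies
    """
    needs = set()
    for task_a, task_b in dependencies:
        for dev_a in assignments.get(task_a, ()):
            for dev_b in assignments.get(task_b, ()):
                if dev_a != dev_b:
                    needs.add(frozenset((dev_a, dev_b)))
    return needs


def congruence(needs, actual):
    """Fraction of needed developer pairs that actually coordinate."""
    if not needs:
        return 1.0  # nothing to coordinate, so nothing is missing
    return len(needs & actual) / len(needs)


# Made-up project: three tasks, three developers, two dependencies.
assignments = {"parser": {"ana"}, "lexer": {"bo"}, "ui": {"cy"}}
dependencies = [("parser", "lexer"), ("ui", "parser")]

# Pairs observed communicating, e.g. taken from chat or issue comments.
actual = {frozenset(("ana", "bo"))}

needs = coordination_needs(assignments, dependencies)
score = congruence(needs, actual)
```

Here two pairs need to coordinate (ana with bo, ana with cy) but only one pair does, so `score` is 0.5. A low score points to dependencies that cross team or site boundaries without a matching communication path.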

21. How Effective Is Modularization?

This chapter examines the effectiveness of modularization in software development by analyzing code changes and dependencies in three large-scale open source systems. The findings suggest that while modularity offers many benefits, developers often need to consider and reason about more of the system than desired due to the limitations of existing modularization techniques. The chapter concludes with recommendations for improving tool support and language features to enhance modularization practices.

Key concept: Modules are thus an entrenched organizational approach to constructing software.

22. The Evidence for Design Patterns

This chapter explores the empirical evidence for the effectiveness of design patterns in software development. The author presents a series of experiments conducted to evaluate the impact of using design patterns on programmer productivity, code quality, and team communication. The findings reveal that while design patterns can be beneficial, their effectiveness depends on factors such as programmer experience, task complexity, and the specific patterns used.

Key concept: Design patterns are reusable solutions for design problems.

23. Evidence-Based Failure Prediction

This chapter presents a comprehensive overview of evidence-based techniques for predicting failures in large-scale software systems, drawing on empirical studies conducted on the Windows operating system family. The chapter discusses various internal metrics, including code coverage, code churn, code complexity, code dependencies, people and organizational metrics, and an integrated/combined approach, providing insights into their effectiveness and limitations for predicting failures.

Key concept: In this chapter we discuss six different sets of metrics for failure prediction.

24. The Art of Collecting Bug Reports

This chapter examines the art of collecting bug reports, emphasizing the importance of high-quality information for effective bug fixing. Through surveys of developers and bug reporters from large open source projects, the author identifies the most valuable information items and common problems in bug reports, and discusses the impact of duplicates and readability on bug resolution time.

Key concept: The quality of information in bug reports can crucially influence the resolution of a bug as well as its resolution time.

25. Where Do Most Software Flaws Come From?

This chapter investigates the common sources of software flaws in large-scale, real-time systems. Drawing on empirical data collected through surveys of developers, the author categorizes faults based on their origin (requirements, design, coding, testing environment, etc.) and analyzes the difficulty of finding and fixing them. The chapter emphasizes the importance of distinguishing between interface and implementation faults and discusses the underlying causes and possible preventative measures for each type of fault.

Key concept: A fundamental aspect in minimizing faults in software systems is the managing of complexity, the most critical of essential characteristics of software systems [Brooks 1995].

26. Novice Professionals: Recent Graduates in a First Software Engineering Job

This chapter explores the experiences and challenges faced by recent computer science graduates as they transition into their first software engineering jobs in industry. Based on direct observations and interviews of novice developers, the authors highlight the importance of “soft” skills, such as communication, collaboration, and navigating large code bases, and suggest ways to better prepare students for the realities of professional software development.

Key concept: They communicated in order to get help and to develop and confirm their understanding of how to do their job in relation to their team.

27. Mining Your Own Evidence

This chapter encourages software developers to “mine” their own evidence by leveraging the wealth of data available in software archives, such as version control systems, bug tracking databases, and execution logs. The author provides a step-by-step guide to mining these archives, highlighting the potential insights that can be gained and the challenges involved in analyzing and interpreting the data.

Key concept: Thus we recommend that before designing your own study, your first task should be to replicate a study that has already shown valid results.
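
A first mining step can be as small as counting how often each file changes in version control. The sketch below is a minimal example and not the chapter's procedure: it assumes a Git repository, and the helper names `read_git_log` and `change_counts` and the sample log text are made up for illustration.

```python
import subprocess
from collections import Counter


def read_git_log(repo_path: str) -> str:
    """Return the file names touched by each commit, one per line.

    `--pretty=format:` suppresses the commit header so that only the
    file names printed by `--name-only` remain.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def change_counts(log_text: str) -> Counter:
    """Count how many commits touched each file."""
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


# Sample text in the shape read_git_log() is expected to return.
SAMPLE_LOG = "src/core.py\nsrc/util.py\n\nsrc/core.py\n\nsrc/core.py\nREADME.md\n"

counts = change_counts(SAMPLE_LOG)
most_changed = counts.most_common(1)
```

On a real repository, `change_counts(read_git_log("."))` gives the same kind of table. Frequently changed files are only a starting point; linking them to bug reports is where the interpretation challenges mentioned above begin.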

28. Copy-Paste as a Principled Engineering Tool

This chapter challenges the prevailing notion that code cloning is always a bad practice, arguing that in certain situations, it can be a valuable and even principled design tool. The authors present a categorization of cloning patterns based on the motivations for creating clones and discuss the advantages, disadvantages, and long-term maintenance implications of each pattern.

Key concept: Every software developer knows that copy-paste—aka code cloning—is a bad habit, yet we all do it.

29. How Usable Are Your APIs?

This chapter explores the importance of API usability for improving the overall developer experience. The author argues that poorly designed APIs can significantly hinder developer productivity, even with the most advanced tools, and emphasizes the need for scenario-based design, which focuses on the tasks developers want to accomplish and their expected workflow.

Key concept: The end result is that developers often can’t figure out how to use the API, and instead of writing code and being productive, they end up spending their time browsing online forums, posting questions and reading answers.

30. What Does 10x Mean? Measuring Variations in Programmer Productivity

This chapter examines the implications of the well-established finding that individual programmer productivity can vary by an order of magnitude (the “10x” phenomenon). The author discusses the challenges in measuring programmer productivity accurately and proposes a multi-faceted approach that considers not just the volume of code produced, but also factors like code quality, impact on the team, and difficulty of tasks.

Key concept: One of the most replicated results in software engineering research is the 10-fold difference in productivity and quality between different programmers with the same levels of experience.

Essential Questions

1. What constitutes “convincing evidence” in software engineering, and how can we ensure that our findings have a real-world impact?

The book explores the multifaceted nature of convincing evidence in software engineering. It challenges the notion that elegance, statistical strength, and replicability alone are sufficient to make evidence truly impactful. Instead, it advocates for a more pragmatic approach, emphasizing the importance of evidence that motivates change and addresses the specific needs and concerns of the target audience. The authors stress the need for tuning evidence to the specific context and business bias of practitioners to ensure its relevance and impact on real-world practices.

2. How can we think critically about software engineering evidence to ensure its credibility and relevance to our specific context?

The book delves into the crucial role of critical thinking when evaluating software engineering evidence. It encourages readers to adopt a systematic approach to assessing the credibility and relevance of research findings, emphasizing the importance of questioning assumptions, considering potential biases, and triangulating evidence from multiple sources to arrive at a more comprehensive understanding of the phenomena under investigation.

3. How can we effectively aggregate evidence from multiple sources to arrive at a more comprehensive understanding of the effectiveness of different software development practices?

The book advocates for the adoption of systematic reviews (SRs) as a rigorous methodology for aggregating and analyzing evidence from multiple empirical studies to inform decision making in software engineering. By following a well-defined methodology, SRs help mitigate biases inherent in traditional literature reviews that rely solely on expert opinion or anecdotal evidence. This systematic approach ensures a more comprehensive and unbiased assessment of the effectiveness of different software development practices, guiding practitioners towards evidence-based decisions.

4. What role do qualitative methods play in understanding software engineering phenomena that are not easily quantifiable, and how can we effectively integrate qualitative and quantitative insights?

The book recognizes the value of qualitative research methods for understanding complex software engineering phenomena that are not easily quantifiable. It argues that while quantitative data is essential in many aspects of computing, it often fails to provide insights into the “why” and “how” questions that arise in software development. Qualitative methods, such as interviews, observations, and case studies, provide a rich understanding of the social, cultural, and human factors that influence software development practices.

5. How can we effectively learn from real-world experimentation and adopt an iterative approach to empirical research in software engineering, especially in the context of human-centric practices?

The book emphasizes the importance of real-world experimentation and the iterative nature of learning in software engineering. By drawing on the experiences and lessons learned from the NASA Software Engineering Laboratory (SEL), the authors advocate for an evolutionary, iterative approach to empirical research. They highlight the value of testing ideas in practice, gathering feedback, and adapting approaches based on real-world constraints and outcomes. This iterative process of learning and adaptation is essential for developing effective and context-aware software engineering practices.

Key Takeaways

1. Don’t rely solely on gut feeling or anecdotal evidence in software engineering; instead, make decisions based on evidence from various sources.

Relying solely on personal experience or assumptions can be detrimental in software engineering. By taking a data-driven approach and gathering evidence from diverse sources, we can gain a more comprehensive understanding of the problem domain, identify potential risks and opportunities, and make more informed decisions about the development process. Evidence helps us to challenge our assumptions, evaluate different approaches, and ensure that our solutions are grounded in reality.

Practical Application:

Suppose you’re leading a team developing a new AI-powered chatbot. Before starting to code, you could ask your team to gather evidence on similar chatbots. What technologies did they use? What were the challenges they faced? What were their success metrics? By analyzing this evidence, you can identify potential pitfalls, make more informed design decisions, and set realistic expectations for your project.

2. Context matters: the effectiveness of different software engineering approaches and the value of different metrics depend heavily on the specific context of the project.

There’s no “one size fits all” solution or metric in software engineering. The effectiveness of different approaches and the value of different metrics depend heavily on the specific context, the problem domain, the target audience, and the goals of the project. Blindly applying a method or using a metric without considering these factors can lead to misleading results and suboptimal outcomes.

Practical Application:

When evaluating an AI model for deployment, don’t rely solely on a single metric like accuracy. Consider other factors, like fairness, bias, explainability, and robustness. A model might be highly accurate but exhibit unacceptable bias, making it unsuitable for real-world use. Evaluating multiple aspects of the model’s performance provides a more comprehensive picture of its suitability for deployment.

3. The design of a software system tends to mirror the communication structure of the team that created it.

The design of a software system often reflects the communication structure of the organization that created it. This principle, known as “Conway’s Law,” highlights the strong interplay between social and technical aspects of software development. By understanding and leveraging this principle, we can improve team communication, reduce coordination overhead, and create software systems that better align with the organizational structure.

Practical Application:

In designing an AI-powered system, consider the communication structure of the team developing it. If you want to create a modular system with well-defined boundaries between components, ensure clear communication channels between the teams responsible for each component. Conversely, if you need tight integration between components, encourage close collaboration and communication between the respective teams. By aligning organizational structure with software architecture, you can promote better coordination, reduce integration issues, and enhance overall product quality.

4. Communication is not a distraction in software development; it is the lubricant that keeps the project moving smoothly.

Communication plays a vital role in software development. Developers spend a significant portion of their time communicating with colleagues, seeking information, clarifying requirements, discussing design choices, and resolving problems. Effective communication, both within and across teams, is essential for maintaining project awareness, coordinating activities, and ensuring a shared understanding of the project goals and design rationale.

Practical Application:

If you’re leading an AI development team, recognize that communication is a critical part of the job, not just a distraction. Encourage regular communication between team members, both formal and informal. Create a culture of information sharing and collaboration. Ensure that developers have easy access to the information they need, both through documentation and by readily being able to ask questions of their colleagues.

5. API usability is not just about documentation; it’s about creating an API that is easy to learn, use, and understand, minimizing the cognitive effort required for developers to integrate it into their applications.

API usability is crucial for developer productivity and satisfaction. A well-designed API allows developers to quickly understand its purpose, find the classes and methods they need, and integrate the API into their applications without having to spend excessive time on learning, debugging, and troubleshooting. Investing in API usability can yield significant returns in terms of developer productivity, reduced development time, and improved software quality.

Practical Application:

Suppose you’re developing an AI-powered recommendation system. Don’t just focus on improving the algorithms’ accuracy. Conduct usability studies to understand how developers will interact with your APIs. Make sure the class names, methods, and documentation are clear and concise and that the API maps well to developers’ mental models. A usable API is just as important as an accurate algorithm for achieving developer productivity and satisfaction.

Suggested Deep Dive

Chapter 11: Conway’s Corollary

This chapter delves into Conway’s Law and provides compelling evidence from various studies, both in industry and open-source, to support the idea that organizational structure influences the design of software systems. Understanding this principle can be particularly useful for AI product engineers as they work within larger teams and companies, helping them design systems that are aligned with the communication structures and expertise distribution within their organizations.

Memorable Quotes

The State of Evidence Today

“Results are far more convincing when they’re found again and again in many different contexts—i.e., not limited to one context or set of experimental conditions.”

Change We Can Believe In

“A more feasible (if humble) definition is this: convincing evidence motivates change.”

Credibility, or Why Should I Insist on Being Convinced?

“Software development is similar: evidence emerges over time, and the quality of the engineering hinges on the critical faculties of the engineer.”

Understanding Software Engineering Through Qualitative Methods

“People trust numbers. They are the core of computation, the fundamentals of finance, and an essential part of human progress in the past century. And for the majority of modern societies, numbers are how we know things: we use them to study which drugs are safe, what policies work, and how our universe evolves.”

What Makes Software Engineering Uniquely Hard to Research

“Software engineering has several characteristics that distinguish it from other disciplines. Software is developed in the creative, intellectual sense, rather than being produced in the manufacturing sense.”

Comparative Analysis

Unlike other software engineering books that primarily focus on technical practices and methodologies, “Making Software” distinguishes itself by emphasizing the importance of evidence-based decision-making. It delves into the nuances of interpreting research findings, conducting empirical studies, and extracting valuable insights from diverse sources of evidence. While other books might provide prescriptive advice, “Making Software” encourages readers to develop critical thinking skills and adopt a more nuanced and context-aware approach to software development.

Reflection

Making Software offers a valuable and insightful exploration of the multifaceted nature of evidence in software engineering, highlighting its importance in shaping effective development practices. While it draws heavily on empirical research and case studies, it also recognizes the limitations of these methods and encourages a balanced approach that combines quantitative and qualitative insights. However, the book’s focus on specific case studies, primarily from large organizations like Microsoft, raises questions about the generalizability of its findings to other contexts, particularly smaller teams and organizations with different development processes. While the book’s insights are valuable, it is important for readers to critically evaluate the applicability of these findings to their specific situations and to consider the potential influence of various contextual factors.

Flashcards

What is a systematic review?

A research methodology for aggregating and analyzing evidence from multiple empirical studies.

How does an automated fault prediction system work?

It involves analyzing code properties, process properties, and fault counts from earlier releases to predict the expected number of defects in each file of a software system.

What does “architecting” refer to in the context of software development?

The overall set of concurrent frontend activities that are key to creating and sustaining a successful software product, including requirements analysis, architecture definition, planning, and risk resolution.

What is Conway’s Law?

The design of a system will mirror the communication structure of the organization that created it.

What is Test-Driven Development (TDD)?

A software development practice that involves writing test cases before writing the code.

What is Pair Programming?

A style of programming in which two programmers work side-by-side at one computer, continuously collaborating on the same design, algorithm, code, or test.

What is modularization?

The practice of breaking down a software system into smaller, self-contained modules or components.

What are design patterns?

Reusable solutions for common design problems in software development.

What is mining software archives?

The process of analyzing data from software repositories, such as version control systems and bug databases, to extract insights about the software development process.

What is API usability?

The degree to which an API is easy to learn, use, and understand.